Construction and Analysis of Word-level Time-aligned Simultaneous Interpretation Corpus
نویسندگان
چکیده
Abstract In this paper, quantitative analyses of the delay in Japanese-to-English (J-E) and English-to-Japanese (E-J) interpretations are described. The Simultaneous Interpretation Database of Nagoya University (SIDB) was used for the analyses. Beginning time and end time of each word were provided to the corpus using HMM-based phoneme segmentation, and the time lag between the corresponding words was calculated as the word-level delay. Word-level delay was calculated for 3,722 pairs and 4,932 pairs of words for J-E and E-J interpretations, respectively. The analyses revealed that J-E interpretation have much larger delay than E-J interpretation and that the difference of word order between Japanese and English affect the degree of delay.
منابع مشابه
Collection of a Simultaneous Translation Corpus for Comparative Analysis
This paper describes the collection of an English-Japanese/Japanese-English simultaneous interpretation corpus. There are two main features of the corpus. The first is that professional simultaneous interpreters with different amounts of experience cooperated with the collection. By comparing data from simultaneous interpretation of each interpreter, it is possible to compare better interpretat...
متن کاملConstruction and utilization of bilingual speech corpus for simultaneous machine interpretation research
This paper describes the design, analysis and utilization of a simultaneous interpretation corpus. The corpus has been constructed at the Center for Integrated Acoustic Information Research (CIAIR) of Nagoya University in order to promote the realization of the multi-lingual communication supporting environment. The size of transcribed data is about 1 million words, and the corpus would deserve...
متن کاملConstruction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation
Abstract With the development of speech and language processing, speech translation systems have been developed. These studies target spoken dialogues, and employ consecutive interpretation, which uses a sentence as the translation unit. On the other hand, there exist a few researches about simultaneous interpreting, and recently, the language resources for promoting simultaneous interpreting r...
متن کاملSemi-Automatic Bilingual Corpus Creation with Zero Entropy Alignments
In this paper, we describe a model for aligning books and documents from bilingual corpus with a goal to create “perfectly” aligned bilingual corpus on word-to-word level. Presented algorithms differ from existing algorithms in consideration of the presence of human translator which usage we are trying to minimize. We treat human translator as an oracle who knows exact alignments and the goal o...
متن کاملVocabulary Lists for EAP and Conversation Students
Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...
متن کامل